Two-class universal approximation example

In [42]:
## This code cell will not be shown in the HTML version of this notebook
# run animator for two-class classification fits
csvname = datapath + '2eggs_data.csv'
demo = nonlib.classification_basis_comparison_3d.Visualizer(csvname)

# run animator
demo.brows_single_fits(num_units = [v for v in range(0,30,1)], basis = 'poly', view = [30,-80])
Out[42]:



Multiclass universal approximation example

This sort of trend holds for multiclass classification (and for unsupervised learning problems as well), as illustrated in the example below. Here we have tuned $100$ single layer tanh neural network units, minimizing the multiclass softmax cost, to fit a toy $C=3$ class dataset. As you move the slider below from left to right, weights from a run of $10,000$ steps of gradient descent are used, with steps from later in the run employed as the slider moves to the right.

In [346]:
## This code cell will not be shown in the HTML version of this notebook
# load in dataset
csvname = datapath + '3_layercake_data.csv'
data = np.loadtxt(csvname,delimiter = ',')
x = data[:-1,:]
y = data[-1:,:] 

# import the v1 library
mylib8 = nonlib.library_v1.superlearn_setup.Setup(x,y)

# choose features
mylib8.choose_features(name = 'multilayer_perceptron',layer_sizes = [2,100,3],activation = 'tanh')

# choose normalizer
mylib8.choose_normalizer(name = 'standard')

# choose cost
mylib8.choose_cost(name = 'multiclass_softmax')

# fit an optimization
mylib8.fit(max_its = 10000,alpha_choice = 10**(-1))

# plot cost history
mylib8.show_histories(start = 10)

# load up animator
demo8 = nonlib.run_animators.Visualizer(csvname)

# pluck out a sample of the weight history
num_frames = 30 # how many evenly spaced weights from the history to animate

# animate based on the sample weight history
demo8.multiclass_animator(mylib8,num_frames,scatter = 'points',show_history = True)
Out[346]:




All stump-based classification Part 1

This same phenomenon holds if we perform any other sort of learning - like classification. Below we use a set of stumps to perform two-class classification, training via gradient descent, on a realistic dataset reminiscent of the 'perfect' three dimensional classification dataset shown in the previous Subsection. As you pull the slider from left to right the tree-based model employs weights from further along in the optimization run, and the fit gets better.

In [7]:
## This code cell will not be shown in the HTML version of this notebook
import copy
import sys
sys.path.append('../../')
from mlrefined_libraries import nonlinear_superlearn_library as nonlib
import autograd.numpy as np
datapath = '../../mlrefined_datasets/nonlinear_superlearn_datasets/'

# load in data
csvname = datapath + '2eggs_data.csv'
data = np.loadtxt(csvname,delimiter = ',')
x = data[:-1,:]
y = data[-1:,:] 

# import the v1 library
mylib7 = nonlib.library_v1.superlearn_setup.Setup(x,y)

# choose features
mylib7.choose_features(name = 'stumps')

# choose normalizer
mylib7.choose_normalizer(name = 'none')

# choose cost
mylib7.choose_cost(name = 'softmax')

# fit an optimization
mylib7.fit(max_its = 5000,alpha_choice = 10**(-2))
In [8]:
# plot
demo7 = nonlib.run_animators.Visualizer(datapath + '2eggs_data.csv')
frames = 10
demo7.animate_static_N2_simple(mylib7,frames,show_history = False,scatter = 'on',view = [30,-50])

All stump-based classification Part 2

This same problem presents itself with all real supervised / unsupervised learning datasets. For example, if we take the two-class classification dataset shown in the third example of this Subsection and more completely tune the parameters of the same set of stumps, we learn a model that - while fitting the training data we currently have even better than before - is far too flexible for future test data. Moving the slider one notch to the right shows the result of a (nearly completely) optimized set of stumps trained on this dataset, with the resulting fit being extremely nonlinear (far too nonlinear for the phenomenon at hand).

In [505]:
## This code cell will not be shown in the HTML version of this notebook
# load in data
csvname = datapath + '2eggs_data.csv'
data = np.loadtxt(csvname,delimiter = ',')
x = data[:-1,:]
y = data[-1:,:] 

# import the v1 library
mylib12 = nonlib.library_v1.superlearn_setup.Setup(x,y)

# choose features
mylib12.choose_features(name = 'stumps')

# choose normalizer
mylib12.choose_normalizer(name = 'none')

# choose cost
mylib12.choose_cost(name = 'softmax')

# fit an optimization
mylib12.fit(optimizer = 'newtons method',max_its = 1)

# plot
demo12 = nonlib.run_animators.Visualizer(datapath + '2eggs_data.csv')
frames = 2
demo12.animate_static_N2_simple(mylib12,frames,show_history = False,scatter = 'on',view = [30,-50])
Out[505]:




Stump-based regression

Below we repeat the experiment above, only here we use $50$ stump units, tuning them to the data using $5,000$ gradient descent steps. Once again as you move the slider a fit resulting from a certain step of gradient descent (reflected in the cost function history) is shown, and as you move from left to right the run progresses and the fit gets better.

In [219]:
## This code cell will not be shown in the HTML version of this notebook
# load in dataset
csvname = datapath + 'universal_regression_samples_0.csv'
data = np.loadtxt(csvname,delimiter = ',')
x = data[:-1,:]
y = data[-1:,:] 

# import the v1 library
mylib6 = nonlib.library_v1.superlearn_setup.Setup(x,y)

# choose features
mylib6.choose_features(name = 'stumps')

# choose normalizer
mylib6.choose_normalizer(name = 'none')

# choose cost
mylib6.choose_cost(name = 'least_squares')

# fit an optimization
mylib6.fit(max_its = 5000,alpha_choice = 10**(-2))

# load up animator
demo6 = nonlib.run_animators.Visualizer(csvname)

# pluck out a sample of the weight history
num_frames = 100 # how many evenly spaced weights from the history to animate

# animate based on the sample weight history
demo6.animate_1d_regression(mylib6,num_frames,scatter = 'points',show_history = True)
Out[219]:




Example. Sinusoidal kernel approximators

Another classic sub-family of kernel universal approximators consists of sine waves of increasing frequency - for example, frequencies increasing by an integer factor, as in

$$f_1(x) = \text{sin}(x), ~~ f_2(x) = \text{sin}(2x), ~~ f_3(x) = \text{sin}(3x), ...$$

where the $m^{th}$ element is given as $f_m(x) = \text{sin}(mx)$.

Below we plot the table of values for the first four of these catalog functions using their equations.

In [91]:
## This code cell will not be shown in the HTML version of this notebook
# build the first 4 sinusoidal basis elements
x = np.linspace(-5,5,100)
fig = plt.figure(figsize = (10,3))

for m in range(1,5):
    # make basis element
    fm = np.sin(m*x)
    fm_table = np.stack((x,fm),axis = 1)
    
    # plot the current element
    ax = fig.add_subplot(1,4,m)
    ax.plot(fm_table[:,0],fm_table[:,1],color = [0,1/float(m),m/float(m+1)],linewidth = 3)
    ax.set_title(r'$f_' + str(m) + r'(x) = \sin(' + str(m) + r'x)$',fontsize = 18)

    # clean up plot
    ax.grid(True, which='both')
    ax.axhline(y=0, color='k')
    ax.axvline(x=0, color='k')
plt.show()

As with the polynomials, notice how each of these catalog elements is fixed. They have no tunable parameters inside; the third element always looks like $f_3(x) = \text{sin}(3x)$ - that is, it always takes on that shape. Also note that, as with polynomials, to generalize this catalog of functions to higher dimensional input we shove each input coordinate through the single dimensional version of the function separately. So in the case of $N=2$ inputs the functions take the form

\begin{equation} f_1(x_1,x_2) = \text{sin}(x_1), ~~ f_2(x_1,x_2) = \text{sin}(2x_1)\text{sin}(5x_2), ~~ f_3(x_1,x_2) = \text{sin}(4x_1)\text{sin}(2x_2), ~~ f_4(x_1,x_2) = \text{sin}(7x_1)\text{sin}(x_2), ~~ ... \end{equation}

These are listed in no particular order; in general we can write a catalog element as $f_m(x_1,x_2) = \text{sin}(px_1)\text{sin}(qx_2)$ where $p$ and $q$ are any nonnegative integers.
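
The general form above can be evaluated directly; below is a minimal numpy sketch, where the helper name `sine_catalog_element` is purely illustrative (it is not part of the library used in this notebook).

```python
import numpy as np

# one element of the two-dimensional sinusoidal catalog:
# f(x_1, x_2) = sin(p x_1) sin(q x_2) for nonnegative integers p and q
def sine_catalog_element(p, q):
    return lambda x_1, x_2: np.sin(p * x_1) * np.sin(q * x_2)

# evaluate e.g. f(x_1, x_2) = sin(2 x_1) sin(5 x_2) over a grid of inputs
f = sine_catalog_element(2, 5)
s = np.linspace(-2, 2, 50)
x_1, x_2 = np.meshgrid(s, s)
print(f(x_1, x_2).shape)  # (50, 50)
```

Note that, as with the one-dimensional catalog, each such element is completely fixed once $p$ and $q$ are chosen - there are no tunable internal parameters.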

We describe the kernel family in significantly more detail in Chapter 15.

Example. Relu example

Choosing another elementary function gives another sub-catalog of single-layer neural network functions. The rectified linear unit (or 'relu' for short) is another popular example, elements of which (for single dimensional input) look like

\begin{equation} f_1(x) = \text{max}\left(0,w_{1,0} + w_{1,1}x\right), ~~ f_2(x) = \text{max}\left(0,w_{2,0} + w_{2,1}x\right), ~~ f_3(x) = \text{max}\left(0,w_{3,0} + w_{3,1}x\right), ~~ f_4(x) = \text{max}\left(0,w_{4,0} + w_{4,1}x\right), ... \end{equation}

Since these also have internal parameters each can once again take on a variety of shapes. Below we plot $4$ instances of such a function, where in each case its internal parameters have been set at random.

In [104]:
## This code cell will not be shown in the HTML version of this notebook
# build 4 instances of a single layer relu basis element
x = np.linspace(-5,5,100)
fig = plt.figure(figsize = (10,3))

for m in range(1,5):
    # make basis element
    w_0 = np.random.randn(1)
    w_1 = np.random.randn(1)
    fm = np.maximum(0,w_0 + w_1*x)
    fm_table = np.stack((x,fm),axis = 1)
    
    # plot the current element
    ax = fig.add_subplot(1,4,m)
    ax.plot(fm_table[:,0],fm_table[:,1],c='r',linewidth = 3)
    ax.set_title('$f$ instance ' + str(m),fontsize = 18)

    # clean up plot
    ax.grid(True, which='both')
    ax.axhline(y=0, color='k')
    ax.axvline(x=0, color='k')

plt.show()

To handle higher dimensional input we simply take a linear combination of the input, passing the result through the nonlinear function. For example, an element $f_j$ for general $N$ dimensional input looks like the following using the relu function

\begin{equation} f_j\left(\mathbf{x}\right) = \text{max}\left(0,w_{j,0} + w_{j,1}x_1 + \cdots + w_{j,\,N}x_N\right). \end{equation}
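
This element can be written out directly in numpy for any $N$; the sketch below is illustrative only (the helper name `relu_element` is not from the library).

```python
import numpy as np

# a single layer relu unit for general N dimensional input:
# f(x) = max(0, w_0 + w_1 x_1 + ... + w_N x_N)
def relu_element(w):
    # w has length N + 1: a bias w[0] followed by N linear combination weights
    return lambda x: np.maximum(0, w[0] + np.dot(w[1:], x))

# one instance with randomly set internal parameters, here for N = 3
w = np.random.randn(4)
f = relu_element(w)
print(f(np.array([1.0, -2.0, 0.5])))  # a single nonnegative value
```

Because the internal parameters $w_{j,0},\ldots,w_{j,N}$ are tunable, each instance of this element takes on a different shape, as the plots below show for $N=2$.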

As with the lower dimensional single layer functions, each such function can take on a variety of different shapes based on how we tune its internal parameters. Below we show $4$ instances of such a function with $N=2$ dimensional input.

In [47]:
## This code cell will not be shown in the HTML version of this notebook
# generate input values
s = np.linspace(-2,2,100)
x_1,x_2 = np.meshgrid(s,s)
# build 4 relu basis element instances
fig = plt.figure(num=None, figsize = (10,4), dpi=80, facecolor='w', edgecolor='k')

### plot relu surfaces ###
for m in range(4):
    ax1 = plt.subplot(1,4,m+1,projection = '3d')
    ax1.set_axis_off()
    
    # random weights
    w_0 = np.random.randn(1)
    w_1 = np.random.randn(1)
    w_2 = np.random.randn(1)
    w_3 = np.random.randn(1)
    f_m = w_3*np.maximum(0,w_0 + w_1*x_1 + w_2*x_2)

    ax1.plot_surface(x_1,x_2,f_m,alpha = 0.35,color = 'w',zorder = 3,edgecolor = 'k',linewidth=1,cstride = 10, rstride = 10)
    ax1.view_init(20,40) 
    ax1.set_title('$f$ instance ' + str(m+1),fontsize = 18)

fig.subplots_adjust(left=0,right=1,bottom=0,top=1)   # remove whitespace around 3d figure
plt.show()

Example. Deeper network example

Composing single layer units - feeding a linear combination of tanh units through another tanh - gives instances of a two-layer (deeper) neural network element. Below we plot $4$ such instances, with all internal parameters set at random.
In [129]:
## This code cell will not be shown in the HTML version of this notebook
# build 4 instances of a composition basis: line and tanh and tanh
x = np.linspace(-5,5,100)
fig = plt.figure(figsize = (10,3))

for m in range(1,5):
    # make basis element
    fm = 0
    for j in range(10):
        w_0 = np.random.randn(1)
        w_1 = np.random.randn(1)
        w_3 = np.random.randn(1)
        fm+=w_3*np.tanh(w_0 + w_1*x)
    w_2 = np.random.randn(1)
    fm = np.tanh(w_2 + fm)
    fm_table = np.stack((x,fm),axis = 1)
    
    # plot the current element
    ax = fig.add_subplot(1,4,m)
    ax.plot(fm_table[:,0],fm_table[:,1],c='r',linewidth = 3,zorder = 3)
    ax.set_title('$f$ instance ' + str(m),fontsize = 18)

    # clean up plot
    ax.grid(True, which='both')
    ax.axhline(y=0, color='k')
    ax.axvline(x=0, color='k')

plt.show()

Example. Deeper trees example

To create a more flexible decision tree basis function we split each level of the stump. This gives us a tree of depth 2 (our first split gave us a stump; another phrase for a stump is a tree of depth 1). We can look at this mathematically / figuratively as in the figure below.

Figure 2: An illustration of a depth two tree function from the family of tree-based universal approximators. Here $V_1$, $V_2$, and $V_3$ are called *split points* and $y_1$ / $y_2$ / $y_3$ / $y_4$ the *levels* of the function.

This gives a basis element with four (potentially) distinct levels. Since the location of the splits and the values of the levels can be set in many ways, this gives each element of a tree basis of depth 2 a good deal more flexibility than stumps. Below we illustrate $4$ instances of a depth $2$ tree.
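
In the notation of Figure 2, such an element can be evaluated directly from its split points and levels; the sketch below is illustrative (the helper name `depth2_tree` and its parameterization are not from the library).

```python
import numpy as np

# a depth 2 tree on scalar input: root split at v_2, child splits at v_1 and v_3,
# producing (up to) four distinct constant levels y_1,...,y_4 over the four intervals
def depth2_tree(v_1, v_2, v_3, levels):
    y_1, y_2, y_3, y_4 = levels  # assumes v_1 <= v_2 <= v_3
    def f(x):
        x = np.asarray(x, dtype=float)
        return np.where(x <= v_2,
                        np.where(x <= v_1, y_1, y_2),
                        np.where(x <= v_3, y_3, y_4))
    return f

# one instance: splits at -2, 0, 2 with levels 1, -0.5, 2, 0
f = depth2_tree(-2.0, 0.0, 2.0, levels=(1.0, -0.5, 2.0, 0.0))
print(f(np.linspace(-5, 5, 11)))
```

Since the three split points and four levels can each be set freely, such an element is considerably more flexible than a single stump.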

We describe trees in significantly further detail in Chapter 14.

In [54]:
## This code cell will not be shown in the HTML version of this notebook
# build 4 instances of a depth 2 tree element (a sum of three stumps)
x = np.linspace(-5,5,100)
fig = plt.figure(figsize = (10,3))

for m in range(1,5):
    # make basis element
    w_0 = 0.1*np.random.randn(1)
    w_1 = 0.1*np.random.randn(1)
    w_2 = np.random.randn(1)
    fm = w_2*np.sign(w_0 + w_1*x)
    
    # make basis element
    w_0 = 0.1*np.random.randn(1)
    w_1 = 0.1*np.random.randn(1)
    w_2 = np.random.randn(1)
    gm = w_2*np.sign(w_0 + w_1*x)
    
    # make basis element
    w_0 = 0.1*np.random.randn(1)
    w_1 = 0.1*np.random.randn(1)
    w_2 = np.random.randn(1)
    bm = w_2*np.sign(w_0 + w_1*x)
    fm += gm
    fm += bm
    fm_table = np.stack((x,fm),axis = 1)
    
    # plot the current element
    ax = fig.add_subplot(1,4,m)
    ax.scatter(fm_table[:,0],fm_table[:,1],c='r',s = 20,zorder = 3)
    ax.set_title('$f$ instance ' + str(m),fontsize = 18)

    # clean up plot
    ax.grid(True, which='both')
    ax.axhline(y=0, color='k')
    ax.axvline(x=0, color='k')

plt.show()